Claims

1. A method for assembly of a plurality of reads from a genomic region, the method comprising the steps of:
providing a plurality of reads from a genomic region;
indexing a plurality of read subsequences for each of the plurality of reads, each subsequence having an associated read with which it corresponds;
extracting from the indexed subsequences a plurality of read pairs that have a predetermined number of subsequences in common; and
merging the read pairs along a continuum.

2. The method of claim 1, comprising the step of providing a plurality of reads generated from sequencing both ends of a plurality of DNA segments, each read having associated linking information comprising an associated orientation relative to a read from an opposite end of the DNA segment, and an associated distance from the read on the opposite end of the DNA segment.

3. The method of claim 1, comprising the step of providing a plurality of reads that are reverse complements of a plurality of reads provided by sequencing both ends of a plurality of DNA segments.

4. The method of claim 1, comprising the step of sorting the indexed read subsequences.

5. The method of claim 1, comprising the step of discarding read subsequences having more than a cutoff number of occurrences from the plurality of indexed subsequences.

6. The method of claim 1, comprising the step of indexing a plurality of read subsequences of a predetermined length for each of the plurality of reads.

7. The method of claim 6, wherein the predetermined length for each of the plurality of reads is between about 12 and about 32 bases long.

8. The method of claim 1, comprising the step of indexing a plurality of subsequences for each read, the index comprising for each subsequence an associated read and an associated position on the read with which it corresponds.

9. The method of claim 1, comprising the step of performing alignments on the plurality of read pairs having a predetermined number of subsequences in common.

10. The method of claim 8, comprising the steps of:
performing alignments on the plurality of read pairs having a predetermined number of subsequences in common; and
using the associated position on the reads with which the subsequences correspond to verify overlap.

11. The method of claim 2, comprising the step of using the linking information associated with the reads to confirm that the merged pairs are merged correctly.

12. The method of claim 2, comprising the step of using the associated linking information to identify an ambiguity in the merged reads.

13. The method of claim 12, comprising the step of identifying a repeat region and a set of unique regions.

14. The method of claim 13, comprising the step of linking pairs of unique regions using the linking information associated with the reads in the unique regions.

15. The method of claim 14, comprising the step of inserting the repeat region between each linked pair of unique regions with which the repeat region corresponds.

16. The method of claim 13, comprising the step of merging linked pairs of unique regions using the linking information associated with the reads in the unique regions.

17. A method for assembly of merged reads from a genomic region, the method comprising the steps of:
providing one or more sets of merged reads from a genomic region comprising a set of reads having associated linking information;
using the associated linking information to identify an ambiguity in the merged reads;
identifying a repeat region and a set of unique regions; and
linking pairs of unique regions using the linking information associated with the reads in the unique regions.

18. The method of claim 17, comprising the step of inserting the repeat region between each linked pair of unique regions with which the repeat region corresponds.

19. The method of claim 17, comprising the step of merging linked pairs of unique regions using the linking information associated with the reads in the unique regions.

20. An article of manufacture having computer-readable program means embodied thereon for assembly of a plurality of reads from a genomic region, the article comprising:
computer-readable program means for providing a plurality of reads from a genomic region;
computer-readable program means for indexing a plurality of read subsequences for each of the plurality of reads, each subsequence having an associated read with which it corresponds;
computer-readable program means for extracting from the indexed subsequences a plurality of read pairs that have a predetermined number of subsequences in common; and
computer-readable program means for merging the read pairs along a continuum.

21. An article of manufacture having computer-readable program means embodied thereon for assembly of merged reads from a genomic region, the article comprising:
computer-readable program means for providing one or more sets of merged reads from a genomic region comprising a set of reads having associated linking information;
computer-readable program means for using the associated linking information to identify one or more ambiguities in the merged reads;
computer-readable program means for identifying a repeat region and a set of unique regions; and
computer-readable program means for linking pairs of unique regions using the linking information associated with the reads in the unique regions.
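The read-assembly steps of claims 1 and 3 to 10 can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the claimed implementation: all function names are hypothetical, the default subsequence length `k=16` is simply one value inside the 12 to 32 base range of claim 7, and the repeat cutoff `max_occurrences` (claim 5) and shared-subsequence threshold `min_shared` (claim 1) are arbitrary placeholders. Overlap verification (claim 10) is approximated by requiring that the indexed positions of the shared subsequences agree on a single offset between the two reads, and merging simply lays reads on one coordinate axis without building a consensus.

```python
from collections import Counter
from itertools import combinations, groupby
from operator import itemgetter

# Complement table for the four DNA bases (claim 3: reverse-complement reads).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(read: str) -> str:
    """Return the reverse complement of a DNA read."""
    return read.translate(COMPLEMENT)[::-1]


def index_subsequences(reads, k=16):
    """Index every length-k subsequence as (subsequence, read id, position),
    then sort so identical subsequences sit next to each other (claims 4, 6, 8).
    k=16 is a hypothetical default within the claimed 12 to 32 base range."""
    entries = [
        (read[pos:pos + k], read_id, pos)
        for read_id, read in enumerate(reads)
        for pos in range(len(read) - k + 1)
    ]
    entries.sort()
    return entries


def candidate_pairs(entries, max_occurrences=8, min_shared=2):
    """Return {(read_a, read_b): [(pos_a, pos_b), ...]} for read pairs sharing at
    least `min_shared` subsequences (claim 1). Subsequences seen more than
    `max_occurrences` times are skipped as likely repeats (claim 5)."""
    shared = {}
    for _, group in groupby(entries, key=itemgetter(0)):
        hits = [(read_id, pos) for _, read_id, pos in group]
        if len(hits) > max_occurrences:
            continue
        # hits are sorted by read id, so a <= b; a == b is a repeat inside one read
        for (a, pos_a), (b, pos_b) in combinations(hits, 2):
            if a < b:
                shared.setdefault((a, b), []).append((pos_a, pos_b))
    return {pair: hits for pair, hits in shared.items() if len(hits) >= min_shared}


def verify_overlaps(pairs, min_shared=2):
    """Keep pairs whose shared subsequences agree on one offset, using the indexed
    positions to check the overlap (claims 9 and 10). The offset is where read b
    starts relative to the start of read a."""
    verified = {}
    for pair, hits in pairs.items():
        offset, support = Counter(pa - pb for pa, pb in hits).most_common(1)[0]
        if support >= min_shared:
            verified[pair] = offset
    return verified


def merge_reads(reads, verified):
    """Lay out the reads connected to read 0 on one coordinate axis and read off
    a contig, keeping the first base placed at each position (no consensus)."""
    start = {0: 0}
    changed = True
    while changed:
        changed = False
        for (a, b), offset in verified.items():
            if a in start and b not in start:
                start[b] = start[a] + offset
                changed = True
            elif b in start and a not in start:
                start[a] = start[b] - offset
                changed = True
    origin = min(start.values())
    length = max(s + len(reads[r]) for r, s in start.items()) - origin
    contig = [""] * length
    for r, s in start.items():
        for i, base in enumerate(reads[r]):
            contig[s - origin + i] = contig[s - origin + i] or base
    return "".join(contig)
```

As a toy check, three reads cut from the 20-base sequence `AAACCCGGGTTTACGTAGCT` so that neighbouring reads overlap by four bases are reassembled into that sequence when indexed with `k=3`; real reads would use a length in the claimed 12 to 32 base range so that chance matches between unrelated reads stay rare.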
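The use of linking information in claims 2 and 11 to 19 can likewise be sketched in Python. This is a hedged illustration only: the claims do not say how an ambiguity or a repeat region is detected, so the `Placement` record, the start-to-start distance convention, the rule that a region whose reads have mates in more than two other regions is a repeat, and the `min_links` threshold are all hypothetical choices. The sketch also assumes a single repeat region corresponds to every linked pair of unique regions.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    """Hypothetical record of where a read landed after merging."""
    region: str    # id of the merged region containing the read
    position: int  # start coordinate of the read within that region
    forward: bool  # True if the read lies on the forward strand of the region


def mates_consistent(p1, p2, expected_distance, tolerance):
    """Check two mates placed in the same region: they must point toward each
    other (left read forward, right read reverse) and be about
    `expected_distance` apart, measured start to start (claim 11)."""
    if p1.region != p2.region or p1.forward == p2.forward:
        return False
    left, right = (p1, p2) if p1.position <= p2.position else (p2, p1)
    distance = right.position - left.position
    return left.forward and abs(distance - expected_distance) <= tolerance


def classify_regions(mate_pairs, placements):
    """Hypothetical rule for claims 12 and 13: a region whose reads have mates in
    more than two other regions cannot occupy one place on a line, so it is
    flagged as a repeat; every other region is treated as unique."""
    neighbours = defaultdict(set)
    for r1, r2 in mate_pairs:
        a, b = placements[r1].region, placements[r2].region
        if a != b:
            neighbours[a].add(b)
            neighbours[b].add(a)
    regions = {p.region for p in placements.values()}
    repeats = {r for r in regions if len(neighbours[r]) > 2}
    return repeats, regions - repeats


def link_unique_regions(mate_pairs, placements, unique, min_links=2):
    """Link two unique regions when at least `min_links` mate pairs join them (claim 14)."""
    links = Counter()
    for r1, r2 in mate_pairs:
        a, b = placements[r1].region, placements[r2].region
        if a != b and a in unique and b in unique:
            links[tuple(sorted((a, b)))] += 1
    return {pair for pair, count in links.items() if count >= min_links}


def scaffold(linked, repeat_region):
    """Place the repeat region between each linked pair of unique regions (claim 15)."""
    return [(a, repeat_region, b) for a, b in sorted(linked)]
```

For example, if a repeat `R` occurs once between unique regions `U1` and `U2` and again between `U3` and `U4`, mates reach `R` from all four unique regions, so it is flagged as a repeat; mate pairs that span it then link `U1` with `U2` and `U3` with `U4`, and `scaffold` places `R` between each linked pair.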